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Abstract 

Particle  splitting  methods  are  considered  for  the  estimation  of  rare 
events.  The  probability  of  interest  is  that  a  Markov  process  first  enters 
a  set  B  before  another  set  A,  and  it  is  assumed  that  this  probability 
satisfies  a  large  deviation  scaling.  A  notion  of  subsolution  is  defined 
for  the  related  calculus  of  variations  problem,  and  two  main  results 
are  proved  under  mild  conditions.  The  first  is  that  the  number  of  par¬ 
ticles  generated  by  the  algorithm  grows  subexponentially  if  and  only 
if  a  certain  scalar  multiple  of  the  importance  function  is  a  subsolu¬ 
tion.  The  second  is  that,  under  the  same  condition,  the  variance  of 
the  algorithm  is  characterized  (asymptotically)  in  terms  of  the  subso¬ 
lution.  The  design  of  asymptotically  optimal  schemes  is  discussed,  and 
numerical  examples  are  presented. 


1  Introduction 

The  numerical  estimation  of  probabilities  of  rare  events  is  a  difficult  problem. 
There  are  many  potential  applications  in  operations  research  and  engineer¬ 
ing,  insurance,  finance,  chemistry,  biology,  and  elsewhere,  and  many  papers 
(and  by  now  even  a  few  books)  have  proposed  numerical  schemes  for  partic¬ 
ular  settings  and  applications.  Because  the  quantity  of  interest  is  very  small, 
standard  Monte  Carlo  simulation  requires  an  enormous  number  of  samples 
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for  the  variance  of  the  resulting  estimate  to  be  comparable  to  the  unknown 
probability.  It  quickly  becomes  unusable,  and  more  efficient  alternatives  are 
sought. 

The  two  most  widely  considered  alternatives  are  those  based  on  change- 
of-measure  techniques  and  those  based  on  branching  processes.  The  former 
is  usually  called  importance  sampling ,  and  the  latter  is  often  referred  to  as 
multi-level  splitting.  While  good  results  on  a  variety  of  problem  formulations 
have  been  reported  for  both  methods,  it  is  also  true  that  both  methods  can 
produce  inaccurate  and  misleading  results.  The  design  issue  is  critical,  and 
one  can  argue  that  the  proper  theoretical  tools  for  the  design  of  importance 
sampling  and  splitting  algorithms  were  simply  not  available  for  complicated 
models  and  problem  formulations. 

Suppose  that  the  probability  of  interest  takes  the  form  p  =  P  {Z  6  G}  = 
p(G),  where  G  is  a  subset  of  some  reasonably  regular  space  (e.g.,  a  Polish 
space  S )  and  p  a  probability  measure.  In  ordinary  Monte  Carlo  one  gener¬ 
ates  a  number  of  independent  and  identically  distributed  (iid)  samples  { Z{ } 
from  p,  and  then  estimates  p  using  the  sample  mean  of  1  {Zi&G}-  I11  the 
case  of  importance  sampling,  one  uses  an  alternative  sampling  distribution 
v,  generates  iid  samples  {Zj}  from  z/,  and  then  estimates  via  the  sample 
mean  of  [dp/dv]  (Zj)lj^  gG|-  The  Radon-Nikodim  derivative  [dp/dv]  ( Z>i ) 
guarantees  that  the  estimate  is  unbiased.  The  goal  is  to  choose  v  so  that 
the  individual  samples  [dp/dv]  gG|  cluster  tightly  around  p ,  thereby 

reducing  the  variance.  However,  for  complicated  process  models  or  events  G 
the  selection  of  a  good  measure  v  may  not  be  simple.  The  papers  [9,  10]  show 
how  certain  standard  heuristic  methods  based  on  ideas  from  large  deviations 
could  produce  very  poor  results.  The  difficulty  is  due  to  points  in  S  with  low 
probability  under  v  for  which  dp/dv  is  very  large.  The  aforementioned  large 
deviation  heuristic  does  not  properly  account  for  the  contribution  of  these 
points  to  the  variance  of  the  estimate,  and  it  is  not  hard  to  find  examples 
where  the  corresponding  importance  sampling  estimator  is  much  worse  than 
even  ordinary  Monte  Carlo.  The  estimates  exhibit  very  inaccurate  and/or 
unstable  behavior,  though  the  instability  may  not  be  evident  from  numerical 
data  until  massive  amounts  have  been  generated. 

The  most  discussed  application  of  splitting  type  schemes  is  to  first  en¬ 
trance  probabilities,  and  to  continue  the  discussion  we  specialize  to  that 
case.  Thus  Z  is  the  sample  path  of  a  stationary  stochastic  process  {Xi} 
(which  for  simplicity  is  taken  to  be  Markovian),  and  G  is  the  set  of  trajec¬ 
tories  that  first  enter  a  set  B  prior  to  entering  a  set  A.  More  precisely,  for 
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disjoint  B  and  A  and  x  ^  A  U  B, 

p  =  p(x )  =  P{Xj  G  B,Xi  (£  A,i  G  {0, .  <  oo|A0  =  x }  . 

In  the  most  simple  version  of  splitting,  the  state  space  is  partitioned  accord¬ 
ing  to  certain  sets  B  C  Cq  C  C\  C  ■  •  •  C  Ck,  with  x  £  Ck  and  A  n  Ck  =  0. 
These  sets  are  often  defined  as  level  sets  of  a  particular  function  V,  which 
is  commonly  called  an  importance  function.  Particles  are  generated  and 
killed  off  according  to  the  following  rules.  A  single  particle  is  started  at 
x.  Generation  of  particles  (splitting)  occurs  whenever  an  existing  particle 
reaches  a  threshold  or  level  C*  for  the  first  time.  At  that  time,  a  (possibly 
random)  number  of  new  particles  are  placed  at  the  location  of  entrance  into 
Ci .  The  future  evolutions  of  these  particles  are  independent  of  each  other 
(and  all  other  particles),  and  follow  the  law  of  {Aj}.  Particles  are  killed  if 
they  enter  A  before  B.  Attached  to  each  particle  is  a  weight.  Whenever  a 
particle  splits  the  weight  of  each  descendent  equals  that  of  the  parent  times 
a  discount  factor.  A  random  tree  is  thereby  produced,  with  each  leaf  corre¬ 
sponding  to  a  particle  that  has  either  reached  B  or  been  killed.  A  random 
variable  (roughly  analogous  to  a  single  sample  [dfi/du]  (Zj)l  r^.eG\  from  the 
importance  sampling  approach)  is  defined  as  the  sum  of  the  weights  for  all 
particles  that  make  it  to  B.  The  rule  that  updates  the  weights  when  a  par¬ 
ticle  splits  is  chosen  so  that  the  expected  value  of  this  random  variable  is 
p.  This  numerical  experiment  is  independently  repeated  a  number  of  times, 
and  the  sample  mean  is  again  used  to  estimate  p. 

There  are  two  potential  sources  of  poor  behavior  in  the  splitting  algo¬ 
rithm.  The  first  and  most  troubling  is  that  the  number  of  particles  may  be 
large.  For  example,  the  number  could  be  comparable  0K  for  some  6  >  1.  In 
settings  where  a  large  deviation  characterization  of  p  is  available,  the  num¬ 
ber  of  levels  itself  usually  grows  with  the  large  deviation  parameter,  and  so 
the  number  of  particles  could  grow  exponentially.  We  will  refer  to  this  as 
instability  of  the  algorithm.  For  obvious  computational  reasons,  instability 
is  something  to  be  avoided.  The  other  source  of  poor  behavior  is  analogous 
to  that  of  importance  sampling  (and  ordinary  Monte  Carlo),  which  is  high 
relative  variance  of  the  estimate.  If  the  weighting  rule  leads  to  high  varia¬ 
tion  of  the  weights  of  particles  that  make  it  to  B,  or  if  too  many  simulations 
produce  no  particles  that  make  it  to  B  (in  which  case  a  zero  is  averaged  in 
the  sample  mean),  then  high  relative  variance  is  likely.  Note,  however,  that 
this  problem  has  a  bounded  potential  for  mischief,  since  the  weights  cannot 
be  larger  than  one.  Such  a  bound  does  not  hold  for  the  Radon-Nikodim 
derivative  of  importance  sampling. 
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When  the  probability  of  interest  can  be  approximated  via  large  devia¬ 
tions,  the  rate  of  decay  is  described  in  terms  of  a  variational  problem,  such 
as  a  calculus  of  variations  or  optimal  control  problem.  It  is  well  known 
that  problems  of  this  sort  are  closely  related  to  a  family  of  nonlinear  par¬ 
tial  differential  equations  (PDE)  known  as  Hamilton- Jacobi-Bellman  (HJB) 
equations.  In  a  pair  of  recent  papers  [3,  5],  it  was  shown  how  subsolutions 
of  the  HJB  equations  associated  with  a  variety  of  rare  event  problems  could 
be  used  to  construct  and  rigorously  analyze  efficient  importance  sampling 
schemes.  In  fact,  the  subsolution  property  turns  out  to  be  in  some  sense 
necessary  and  sufficient,  in  that  efficient  schemes  can  be  shown  to  imply  the 
existence  of  an  associated  subsolution. 

The  purpose  of  the  present  paper  is  to  show  that  in  certain  circumstances 
a  remarkably  similar  result  holds  for  splitting  algorithms.  More  precisely, 
we  will  show  the  following  under  relatively  mild  conditions. 

•  A  necessary  and  sufficient  condition  for  the  stability  of  the  splitting 
scheme  associated  to  a  given  importance  function  is  that  a  certain 
scalar  multiple  of  the  importance  function  be  a  subsolution  of  the 
related  HJB  equation.  The  multiplier  is  the  ratio  of  the  logarithm  of 
the  expected  number  of  offspring  for  each  split  and  the  gap  between 
the  levels. 

•  If  the  subsolution  property  is  satisfied,  then  the  variance  of  the  split¬ 
ting  scheme  decays  exponentially  with  a  rate  defined  in  terms  of  the 
value  of  the  subsolution  at  a  certain  point. 

•  As  in  the  case  of  importance  sampling,  when  a  subsolution  has  the 
maximum  possible  value  at  this  point  (which  is  the  value  of  the  cor¬ 
responding  solution),  the  scheme  is  in  some  sense  asymptotically  op¬ 
timal. 

These  results  are  significant  for  several  reasons.  The  most  obvious  is  that 
a  splitting  algorithm  is  probably  not  useful  if  it  is  not  stable,  and  the  subso¬ 
lution  property  provides  a  way  of  checking  stability.  A  second  is  that  good, 
suboptimal  schemes  can  be  constructed  and  compared  via  the  subsolutions 
framework.  A  third  reason  is  that  for  interesting  classes  of  problems  it  is 
possible  to  construct  subsolutions  that  correspond  to  asymptotically  optimal 
algorithms  (see  [3,  5]).  Subsolutions  can  be  much  easier  to  construct  than 
solutions.  In  this  context  it  is  worth  noting  that  the  type  of  subsolution 
required  for  splitting  (a  viscosity  subsolution  [1,  6])  is  less  restrictive  that 


4 


the  type  of  subsolution  required  for  importance  sampling.  Further  remarks 


on  this  point  will  be  given  in  Section  5. 

An  outline  of  the  paper  is  as  follows.  In  the  next  section  we  describe 
the  probabilities  to  be  approximated,  state  assumptions,  and  formulate  the 
splitting  algorithm.  This  section  also  presents  a  closely  related  algorithm 
that  will  be  used  in  the  analysis.  Section  3  studies  the  stability  problem, 
and  Section  4  shows  how  to  bound  the  variance  of  an  estimator  in  terms 
of  the  related  subsolution.  The  results  of  Sections  3  and  4  can  be  phrased 
directly  in  terms  of  the  solution  to  the  calculus  of  variations  problem  that 
is  related  to  the  large  deviation  asymptotics.  However,  for  the  purposes 
of  practical  construction  of  importance  functions  the  characterization  via 
subsolutions  of  a  PDE  is  more  useful.  These  issues  are  discussed  in  Section 
5,  and  examples  and  numerical  examples  are  presented  in  the  concluding 
Section  6. 

Acknowledgment.  Our  interest  in  the  parallels  between  importance  sam¬ 
pling  and  multi-level  splitting  was  stimulated  by  a  talk  given  by  P.T.  de 
Boer  at  the  RESIM  conference  in  Bamberg,  Germany  [2]. 

2  Problem  Formulation 

2.1  Problem  Setting  and  Large  Deviation  Properties 

A  domain  D  C  is  given  and  also  a  sequence  of  discrete  time,  stationary, 
Markov  D— valued  processes  {Xn}.  Disjoint  sets  A  and  B  are  given,  and 
we  set  rn  =  min{i  :  Xf  £  A  U  B}.  The  probability  of  interest  is  then 


pn(xn)  =  P{X^€B  I  A, 


Xn  }  • 


n 


0 


The  varying  initial  conditions  are  used  for  greater  generality,  but  also  be¬ 
cause  initial  conditions  for  the  prelimit  processes  may  be  restricted  to  some 
subset  of  D.  The  analogous  continuous  time  framework  can  also  be  used 
with  analogous  assumptions  and  results.  For  a  given  point  x  ^  Au  B,  we 
make  the  following  large  deviation-type  assumption. 

Condition  1  For  any  sequence  xn  — >  x, 


where  W (x)  is  the  solution  to  a  control  problem  of  the  form 
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Here  L  :  x  — >  [0,oo],  and  the  infimum  is  taken  over  all  absolutely 

continuous  functions  0  with  0(0)  =  x,  cf>(t)  G  B,  0(s)  0  A  for  all  s  G  [0,  t] 
and  some  t  <  oo. 


Remark  2  The  assumption  that  {Xn}  be  Markovian  is  not  necessary  for 
the  proofs  to  follow.  For  example,  it  could  be  the  case  that  Xf  is  the  first 
component  of  a  Markov  process  (Xf,  Yf1)  (e.g.,  so-called  Markov-modulated 
processes).  In  such  a  case  it  is  enough  that  the  analogous  large  deviation 
limit  hold  uniformly  in  all  possible  initial  conditions  YJ1,  and  indeed  the 
proofs  given  below  will  carry  over  with  only  notational  changes.  This  can 
be  further  weakened,  e.g.,  it  is  enough  that  the  estimates  hold  uniformly 
with  sufficiently  high  probability  in  the  conditioning  data.  However,  the 
construction  of  subsolutions  will  be  more  difficult,  since  the  PDE  discussed  in 
Section  5  is  no  longer  available  in  explicit  form.  See  [5]  for  further  discussion 
on  this  point. 

It  is  useful  to  say  a  few  words  on  how  one  can  verify  conditions  like  Con¬ 
dition  1  from  existing  large  deviation  results.  Similar  but  slightly  different 
assumptions  will  be  made  in  various  places  in  the  sequel,  and  in  all  cases 
analogous  remarks  will  apply. 

For  discrete  time  processes  one  often  finds  process-level  large  deviation 
properties  phrased  in  terms  of  a  continuous  time  interpolation  Xn(t),  with 
Xn{i/n )  =  Xf  and  Xn{t)  defined  by  piecewise  linear  interpolation  for  t 
not  of  the  form  t  =  i/n.  In  precise  terms,  process-level  large  deviation 
asymptotics  hold  for  {Xn}  if  the  following  upper  and  lower  bounds  hold  for 
each  T  G  (0,  oo)  and  any  sequence  of  initial  conditions  xn  G  D  with  xn  — >  x. 
Define 

=  ^  L  ds 

if  0  is  absolutely  continuous  with  0(0)  =  x,  and  /J(0)  =  oo  otherwise.  If  F 
is  any  closed  subset  of  C([0,  T]  :  D )  then  the  upper  bound 

limsup  —  logP{Xn  G  F  |Xn(0)  =  xn  }  <  —  inf  /J(0) 

n — >oo  Tl  (f)E.F 

holds,  and  if  O  is  any  open  subset  of  C([0,  T]  :  D )  then  the  lower  bound 

lim  inf  —  log  P  {Xn  eO\Xn{0)  =  xn}  >  -  inf  /J(0) 
n— >o o  n  060 
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holds.  It  is  also  usual  to  assume  that  for  each  fixed  x,  T,  and  any  M  <  oo, 
the  set 

{<t>zC{[Q,T\-.D)-.%{<j>)<M} 

is  compact  in  C([0,T]  :  D).  The  zero-cost  trajectories  (i.e. ,  paths  <fi  for 
which  ij (<j))  =  0)  are  particularly  significant  in  that  all  other  paths  are  in 
some  sense  exponentially  unlikely. 

With  regard  to  Condition  1,  two  different  types  of  additional  conditions 
beyond  the  sample  path  large  deviation  principle  are  required.  One  is  a  con¬ 
dition  that  allows  a  reduction  to  large  deviation  properties  over  a  finite  time 
interval.  For  example,  suppose  that  there  is  T  such  that  if  cf  enters  neither 
A  nor  B  before  T,  then  ij (<f)  >  W(x)  +  1.  In  this  case,  the  contribution  to 
pn{xn)  from  sample  paths  that  take  longer  than  T  is  negligible,  and  can  be 
ignored.  This  allows  an  application  of  the  finite  time  large  deviation  prin¬ 
ciple.  Now  let  G  be  the  set  of  trajectories  that  enter  B  at  some  time  t  <  T 
without  having  previously  entered  A.  By  the  first  condition,  the  asymptotic 
rates  of  decay  of  pn(xn )  and  P  { Xn  G  G  |Xn(0)  =  xn}  are  the  same.  The 
second  type  of  condition  is  to  impose  enough  regularity  on  the  sets  A  and  B 
and  the  rate  function  ij  ((f)  that  the  infimum  over  the  interior  and  closure 
of  G  are  the  same.  These  points  are  discussed  at  length  in  the  literature  on 
large  deviations  [7] . 

Example  3  Assume  the  following  conditions:  L(-,  •)  is  lower  semicontinu- 
ous;  for  each  x  G  D,  L(x,-)  is  convex;  L(x,  •)  is  uniformly  superlinear;  for 
each  x  G  D  there  is  a  unique  point  b(x)  for  which  L(x,  b(x))  =  0;  b  is  Lips- 
chitz  continuous,  and  all  solutions  to  (f  =  b(cf )  are  attracted  to  6  G  A,  with  A 
open.  Let  V  C  D  be  a  bounded  domain  that  contains  A  and  B,  and  assume 
(b(x),n(x))  <  0  for  x  G  dV,  where  n(x)  is  the  outward  normal  to  V  at  x. 
Suppose  that  the  cost  to  go  from  x  to  any  point  in  dV  is  at  least  W(x)  +  1. 
Then  T  as  described  above  exists. 

2.2  The  Splitting  Algorithm 

In  order  to  define  a  spitting  algorithm  we  need  to  choose  an  importance 
function  V(y)  and  a  level  size  A  >  0.  We  will  require  that  V(y)  be  continu¬ 
ous  and  that  V(y)  <  0  for  all  y  G  B.  Later  on  we  will  relate  V  to  the  value 
function  W,  and  discuss  why  subsolutions  to  the  PDE  that  is  satisfied  by 
W  are  closely  related  to  natural  candidates  for  the  importance  function. 

To  simplify  the  presentation,  we  consider  only  splitting  mechanisms  with 
an  a  priori  bound  R  <  oo  on  the  maximum  number  of  offspring.  The 
restriction  is  convenient  for  the  analysis,  and  as  we  will  see  is  without  loss  of 
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generality.  The  set  of  (deterministic)  splitting  mechanisms  will  be  indexed 
by  j  E  {1, . . . ,  J}.  Given  that  mechanism  j  has  been  selected,  r(j)  particles 
(with  |r(j)|  <  R )  are  generated  and  weights  w(j)  E  are  assigned  to 

the  particles.  Note  that  we  do  not  assume  Yll=i  wi(j)  =  1-  The  class  of  all 
splitting  mechanisms  (i.e.,  including  randomized  mechanisms)  is  identified 
with  the  set  of  all  probability  distributions  on  {1, . . . ,  J}. 

Associated  with  V  are  the  level  sets 


Lz  =  {y  €  D  :  V(y)  <  z}. 

A  key  technical  condition  we  use  is  the  following.  In  the  condition,  Ex 
denotes  expected  value  given  Xft  =  x. 


Condition  4  Let  z  E  [0,  V{x)\  be  given  and  define  an  =  min  {i  :  Xfi  E  A  U  Lzj. 
Then 


lim  inf - log  Ex 

n—>  oo  77, 


1  {x^lz}  (Pnmr\  >  W[x)  +^inf  ^  W{y). 


Under  the  conditions  discussed  after  Condition  1  which  allow  one  to  con¬ 
sider  bounded  time  intervals,  Condition  4  follows  from  the  Markov  property 
and  the  large  deviation  upper  bound. 

We  also  define  collections  of  sets  jUg1  =  B,  C™  =  T(j-i)A/n-  j  =  1,  •  ■  .j. 
Define  the  level  function  ln  by  ln(y)  =  min{j  >  0  :  y  E  C”}.  The  location  of 
the  starting  point  corresponds  to  ln(x )  =  \riV(x)/ A],  and  ln  =  0  indicates 
entry  into  the  target  set  B.  The  splitting  algorithm  associated  with  a  partic¬ 
ular  distribution  q  will  now  be  defined.  Although  the  algorithm  depends  on 
V,  q,  r,  w,  xn,  A,  A  and  B,  to  minimize  notational  clutter  these  dependencies 
are  not  explicitly  denoted. 


Splitting  Algorithm  (SA) 


Variables : 

Nfi  number  of  particles  in  generation  r 
X™k  position  of  kth  particle  in  generation  r 
w™k  weight  of  kth  particle  in  generation  r 
Initialization  Step: 

Rtq  =  1 .  XI,  =xn,  <;1  =  1 
for  r  =  1, . . . ,ln(xn ) 

Nfi  =  0 

end 
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Main  Algorithm: 

for  r  =  1, . .  .,ln(xn) 

if  AT«_1  =  0  then  N?  =  0 
else 

for  j  =  l,...,iVrn_1 

generate  i  a  single  sample  of  a  process  with  the 
same  law  as  X™  and  initial  condition  Z™j0  =  X™_1j 

let 

Splitting  Step  begin 

if  Z”  •  „  ^  A 

r,j 

let  M  be  an  independent  sample  from  the  law  q 

for  k  =  1, . . . ,  \r(M)\ 

N?  =  N?  +  1 

vn  _  yn 

end 

end 

Splitting  Step  end 


end 


end 

end 

Construction  of  a  sample: 

once  all  the  generations  have  been  calculated  we  form 
the  quantity 


N, 


’SA 


El  11 1 
7=1 


lnM  „.,n 


W 


ln{xn),j  ‘ 


An  estimate  PgA(xn)  of  pn(xn )  is  formed  by  averaging  a  number  of  inde¬ 
pendent  samples  of  SgA.  Observe  that  once  generation  r  has  been  calculated 
the  information  about  generations  0  to  j —  1  can  be  discarded,  and  so  there 
is  no  need  to  keep  all  the  data  in  memory  until  completion  of  the  algorithm. 
Also  note  that  in  practice  there  is  no  need  to  split  upon  entering  CJ}  =  B. 


We  first  need  to  find  conditions  under  which  this  splitting  algorithm  gives 
an  unbiased  estimator  of  pn(xn).  To  simplify  this  and  other  calculations  we 
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Figure  1:  The  Sets  A  and  B  and  Level  Sets  of  V. 


introduce  an  auxiliary  algorithm  titled  Splitting  Algorithm  Fully  Branching 
(SFB).  The  essential  difference  between  the  two  is  that  the  process  dynamics 
are  redefined  in  A  to  make  it  absorbing,  and  that  splitting  continues  even 
after  a  particle  enters  A.  When  the  estimate  is  constructed  we  only  count  the 
particles  which  are  in  B  in  the  last  generation,  so  that  the  two  estimates  have 
the  same  distribution.  The  SFB  algorithm  is  more  convenient  for  purposes 
of  analysis,  because  we  do  not  distinguish  those  particles  which  have  entered 
A  from  those  which  still  have  a  chance  to  enter  B.  Of  course  this  algorithm 
would  be  terrible  from  a  practical  perspective-the  total  number  of  particles 
is  certain  to  grow  exponentially.  However,  the  algorithm  is  used  only  for 
the  purposes  of  theoretical  analysis,  and  the  number  of  particles  is  not  a 
concern.  Overbars  are  used  to  distinguish  this  algorithm  from  the  previous 
one. 

Splitting  Algorithm  Fully  Branching  (SFB) 

Variables : 

iV”  number  of  particles  in  generation  r 
X™k  position  of  kth  particle  in  generation  r 
w™k  weight  of  kth  particle  in  generation  r 
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Initialization  Step: 

N$  =  l,  XI,  =xn,  <,  =  1 

for  r  =  1, . .  .,ln(xn) 

N?  =  0 

end 

Main  Algorithm: 

for  r  =1, . . . ,ln{xn ) 

if  N?_x  =  0  then  iVrn  =  0 
else 

for  j  =  1, . .  .,A7rn_i 

generate  a  single  sample  of  a  process  with  the 

same  law  as  X™  and  initial  condition  Z™j0  =  X™_1j 

iet  =  min{z  :  6iU  C£(Xn)_r} 

Splitting  Step  begin 

let  M  be  an  independent  sample  from  the  law  q 

for  k  =  1, . . . ,  |r(M)| 

Xrn  =  JVrn_+  1 

vn  _  yn 

r,N ?  ~ 

=  Wk(M)w?_lij 

end 

Splitting  Step  end 


end 


end 

end 

Construction  of  a  sample: 

once  all  the  generations  have  been  calculated  we  form 
the  quantity 


.A', 


cn  L  \xn)  i  0j. .n 

^SFB-  Y  =  1  L{xrnM.ZB} 

Since  the  distributions  of  the  two  estimates  coincide 


IT 

l 

IT 

3 

_ 1 

Ex 

J=1 

=  Ex 

•Ln 

Because  of  this,  the  SFB  algorithm  can  be  used  to  prove  the  following. 
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Lemma  5  An  estimator  based  on  independent  copies  of  SgA 
and  only  if 


E 


r{M) 

E  MM) 

i= 1 


=  l. 


is  unbiased  if 


Proof.  It  suffices  to  prove 


E,r 


N, 


ln(xn) 

£  1 

3= 1 


=  pn(xn). 


We  will  use  a  particular  construction  of  the  SFB  algorithm  that  is  useful 
here  and  elsewhere  in  the  paper.  Recall  that  with  this  algorithm  every 
particle  splits  at  every  generation.  Hence  the  random  number  of  particles 
associated  with  each  splitting  can  be  generated  prior  to  the  generation  of 
any  trajectories  that  will  determine  particle  locations.  As  a  consequence,  the 
total  number  of  particles  present  at  the  last  generation  can  be  calculated,  as 
can  the  weight  that  will  be  assigned  to  each  particle  in  this  final  generation, 
prior  to  the  assignment  of  a  trajectory  to  the  particle.  Once  the  weights 
have  been  assigned,  the  trajectories  of  all  the  particles  can  be  constructed 
in  terms  of  random  variables  that  are  independent  of  those  used  to  construct 
the  weights.  Since  the  probability  that  any  such  trajectory  makes  it  to  B 
prior  to  hitting  A  is  pn(xn), 


Nn 

Iyin(xn) 

Nn 

iyinM 

Ex 

•Ln 

—  P  ( xn)EXn 

- 1 

'3 

_ 1 

A  simple  proof  by  induction  and  the  independence  of  the  splitting  from 
particle  to  particle  shows  that 


( 

'r(M) 

Ex 

E 

=  \E 

E  w^M) 

3= 1 

\ 

i= 1 

in{xn) 


For  the  rest  of  the  paper  we  restrict  attention  to  splitting  mechanisms 
that  are  unbiased. 
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3  Stability 


Now  let  an  importance  function  V,  level  A,  and  splitting  mechanism  ( q ,  r,  w) 
be  given.  Define 

J(x,y)=  inf  f  L  U(s),  fi(s))  ds,  (1) 

<t>,t:<t>(0 )=x,<j>(t)=y  J0  V  J 

where  the  infinrunr  is  over  absolutely  continuous  functions.  A  function  W  : 
D  — >  M  will  be  called  a  subsolution  if  W(x)  <  0  for  all  x  E  B  and  if 
W(x)  —  W(y)  <  J{x,y)  for  all  x,y  G  D\(AuB).  In  Section  5  we  will 
discuss  conditions  under  which  W  can  be  identified  as  a  viscosity  subsolution 
for  an  associated  PDE.  Recall  that  a  splitting  algorithm  is  called  stable  if 
the  total  number  of  particles  ever  used  grows  subexponentially  as  n  — >  oo. 
For  a  given  splitting  algorithm  define 

W(x)  =  l0g^(MV(x).  (2) 

In  this  section  we  show  that,  loosely  speaking,  a  splitting  algorithm  is  stable 
if  and  only  if  IF  is  a  subsolution. 

A  construction  that  will  simplify  some  of  the  proofs  is  to  replace  a 
given  splitting  mechanism  by  one  for  which  all  the  weights  are  constant. 
Thus  ( q,r,w )  is  replaced  by  ( q,r,w ),  where  for  each  j  =  1  ,...,</  and 

*  =  b  •  ■•,»■( j ), 

J 

[wiij)}^1  =  Er{M)  =  Y,rti)qj- 

3= 1 

The  new  splitting  mechanism  is  also  unbiased,  and  the  distribution  of  the 
number  of  particles  at  each  stage  is  the  same  as  that  of  (q,  r,  vj). 

To  establish  the  instability  we  make  a  very  mild  assumption  on  a  large 
deviation  lower  bound  for  the  probability  that  an  open  ball  is  hit  prior  to 
reaching  A.  This  assumption  can  be  expected  to  hold  under  conditions 
which  guarantee  Condition  1. 

Proposition  6  Consider  an  importance  function  V,  level  A,  and  split¬ 
ting  mechanism  ( q,r,w ),  and  define  W  by  (2).  Suppose  there  exists  y  S 
D\  ( A  U  B)  such  that  W{y)  >0  and 

W(x)-W(y)>J(x,y).  (3) 
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Assume  that  J{x,y)  is  continuous  at  y.  Letpn(xn)  be  the  probability  that 
Xn  enters  the  ball  of  radius  5  >  0  about  y  before  entering  A,  given  Xfi  =  xn, 
and  assume 

liminf  —  logpn(xn)  >  —  inf  J(x,z). 

n-^roo  n  z:\z—y\<6 

Then  the  corresponding  splitting  algorithm  is  not  stable. 

Proof.  It  is  enough  to  prove  the  instability  of  the  algorithm  that  uses 
( q,r,w ).  Since  J(x,y)  >  0,  V(y)  <  V(x).  From  the  definition  of  W  in  (2) 
and  (3)  there  exist  5  >  0  and  e  >  0  such  that  for  all  z  with  \y  —  z\  <  5, 

\v(x)  -V{z)]  l0gi^(M)  >  J(x,  z)  +  £. 

Let  S  =  {z  :  \y  —  zj  <  <5}.  By  taking  5  >  0  smaller  if  necessary  we  can 
guarantee  that  S  n  A  =  0  and  V(z)  >0  for  all  z  G  S. 


Figure  2:  Level  Sets  of  V  in  Proof  of  Instability. 


Suppose  one  were  to  consider  the  problem  of  estimating  pn(xn) .  One 
could  use  the  same  splitting  mechanism  and  level  sets,  and  even  the  same 
random  variables,  except  that  one  would  stop  on  hitting  A  or  S  rather 
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than  A  ot  B,  and  the  last  stage  would  correspond  to  some  number  mn  < 
ln(xn).  Of  course,  since  V  is  positive  on  S  it  will  no  longer  work  as  an 
importance  function,  but  there  is  a  function  V  =  V  —  a  that  will  induce 
exactly  the  same  level  sets  as  V  and  which  can  serve  as  an  importance 
function  for  this  problem.  See  Figure  2.  The  two  problems  can  be  coupled, 
in  that  exactly  the  same  random  variables  can  be  used  to  construct  both 
the  splitting  mechanisms  and  particle  trajectories  up  until  the  particles  in 
the  pn(xn)  problem  enter  Of. 

If  a  particular  particle  has  not  been  trapped  in  A  prior  to  entering  Of, 
then  that  particle  would  also  not  yet  be  trapped  in  A  in  the  corresponding 
scheme  used  to  estimate  pn(xn).  Note  also  that  the  number  of  particles 
that  make  it  to  S  are  at  most  R  times  the  number  that  make  it  to  Of. 
Let  N^n  denote  the  number  of  particles  that  make  it  to  S  in  the  SA  used 
to  approximate  pn(xn),  and  let  AT^,  ^  be  the  number  used  in  the  SA  that 

approximates  pn(xn).  Then  AT™,  ^  >  N^n/R. 

Using  the  SFB  variant  in  the  same  way  that  it  was  used  in  the  proof  of 
Lemma  5  and  that  the  mechanism  (q,  r,  w)  is  used, 


[Er(M)]-mn  Ex  AT 


m 

mn 


We  now  use  the  lower  bound  on  pn(xn)  and  that  mn/n  —>  [V (x)  —  sup2gS  V (z)\  / A: 


lim  inf  —  log  Ex 

n— >o o  77,  L 

=  lim  inf  -  log  EXn  \pn(xn )  [. Er(M)]m" 


z&S  [  A 

e. 


It  follows  that 


lim  inf  —  log  EXn 


Nr(xn)  -  e  >  0 
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which  completes  the  proof.  ■ 

The  next  proposition  considers  stability.  Here  we  will  make  a  mild  as¬ 
sumption  concerning  a  large  deviation  upper  bound,  which  can  also  be  ex¬ 
pected  to  hold  under  conditions  which  guarantee  Condition  1. 

Proposition  7  Consider  an  importance  function  V,  level  A,  and  splitting 
mechanism  ( q,r,w ),  and  define  W  by  (2).  Suppose  that 

W(x)-W(y)<J(x,y) 

for  all  x,  y  E  D\  ( A  U  B)  and  that  W(y)  <  0  for  all  y  E  B.  Consider  any 
a  E  (0,  P(x)/A],  let  pn(xn)  be  the  probability  that  Xn  enters  level  set  La 
before  entering  A  (given  X(f  =  xn),  and  assume 

limsup  —  logpn(xn)  <  —  inf  J{x,z). 
n — >oo  U.  z£La 

Then  the  corresponding  splitting  algorithm  is  stable. 

Proof.  For  each  n  let  rn  be  the  value  in  {1, . . . ,  ln(x)}  that  maximizes  r  — * 
Ex  [IV”].  Since  rn/n  is  bounded,  along  some  subsequence  (again  denoted 
by  n)  we  have  rn/n  — >  v  E  [0,  U(x)/A].  Using  the  usual  argument  by 
contradiction,  it  is  enough  to  prove 

lirri  sup  —  log  EXn  [Nfin]  <  0 

n— >oo  Tl 

along  this  subsequence.  First  suppose  that  v  =  0.  Given  5  >  0,  choose 
h  <  oo  such  that  rn/n  <  6  for  all  n  >  h.  Then  NfnXri  <  RSn,  and  so 
limsup^oo  ^  log  EXn  [NfiuXri]  <  5  ■  logf?.  Since  6  >  0  is  arbitrary,  this  case 
is  complete. 


Now  assume  v  E  (0,  V(x)/A]  and  let  5  E  (0,  v)  be  given.  Suppose  one 
were  to  consider  the  problem  of  estimating  pn(xn )  as  defined  in  the  statement 
of  the  proposition,  with  a  =  v.  We  again  use  the  same  splitting  mechanism 
and  level  sets,  except  that  we  now  stop  on  hitting  A  or  Lv+$.  An  impor¬ 
tance  function  with  these  level  sets  can  be  found  by  adding  a  constant  to 
V.  We  again  couple  the  processes,  and  observe  that  entry  into  C”  for  the 
pn(xn )  problem  corresponds  to  entry  into  C,”n  in  the  pn(xn)  problem,  where 
mn/n  — >  (V(x)  —  v  —  5)/ A  as  n  — >  oo.  Observe  that  every  particle  in  the 
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Figure  3:  Level  Sets  of  V  in  Proof  of  Stability. 


algorithm  used  to  estimate  pn(xn)  that  is  not  trapped  in  A  by  stage  mn  is 
also  not  trapped  in  A  in  the  algorithm  used  to  estimate  pn{xn).  Hence  the 
number  of  such  particles  can  serve  as  an  upper  bound  on  the  number  used 
to  construct  SgA.  Let  denote  the  number  of  such  particles  for  the  SA 
used  to  estimate  pn(xn). 

We  again  use  the  SFB  variant  in  the  same  way  that  it  was  used  in  the 
proof  of  Lemma  5  and  the  (q,  r,  w)  splitting  mechanism  to  obtain 


Pn{xn ) 


[Er(M)]~mn  Ex. 


Using  the  upper  bound  on  pn{xn )  and  that  mn/n  —>  \V(x)  —  v  —  5]  /A: 


lim  sup  —  log  Ex 

n— KX)  Tl 


Nr‘ 


lim  sup  —  log  Ex 

n— >oo  Tl 


pn(xn )  [Er(M)]r 


<  -  inf  J(x,  z)  +  - —log  [Er{M)] 

Z£.LvA-§  ZX 


sup 


[V(x)  V^(-s)] 

A 


log  [Er{M)\  -  J(x,z ) 


<  0. 


For  sufficiently  large  n  we  have  rn  —  mn  <  2<5n/A,  and  hence  N™n  <  N. £ 
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R25n/A.  It  follows  that 


lim  sup  —  log  EXn  [N™n\  <  (25/ A)  -  log  R, 


and  since  5  >  0  is  arbitrary  the  proof  is  complete.  ■ 

4  Asymptotic  Performance 

Since  the  sample  SgA  has  mean  pn(xn),  any  estimator  constructed  as  an 
average  of  independent  copies  of  SgA  is  unbiased  and  has  variance  propor¬ 
tional  to  var^  [sgA],  Once  the  mean  is  fixed,  the  minimization  of  varXn  [sg^] 
among  splitting  algorithms  is  equivalent  to  the  minimization  of  EXn  [sgA]  . 
It  is  of  course  very  difficult  to  find  the  minimizer  in  this  problem.  When  a 
large  deviation  scaling  holds,  a  useful  alternative  is  to  maximize  the  rate  of 
decay  of  the  second  moment,  i.e. ,  to  maximize 


O 


By  Jensen’s  inequality  the  best  possible  rate  is  2 W{x): 


The  main  result  of  this  section  is  the  following. 

Theorem  8  Consider  an  importance  function  V,  level  A,  and  splitting 
mechanism  ( q,r,w ),  and  define  W  by  (2).  Suppose  that 


W(x)-W(y)<J(x,y) 


for  all  1,1/  £  D\  {A  U  B )  and  that  W(y)  <  0  for  all  y  E  B.  Assume  also 
that  Conditions  1  and  4  hold.  Then 
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Proof.  It  is  sufficient  to  consider  the  SFB  algorithm  and  prove  that 


N, 


1  2 


lim - log  Ex 

n— >oo  n 


=  W{x)  -  V(x ) 


in(xn) 

E  i 

3= 


\  L{^n(xnhj^}Wh-n),3 

log  (eYh=P  Mm)2 


A 


The  proof  is  broken  into  upper  and  lower  bounds. 
We  first  prove 


lim  sup - log  Ex 

n — »oo  Tl 


<  W{x)  -  V{x) 


n; 


1  2 


.£  B^Wln(xn),. 


ln(.Xn) 

E  i{ 

7=1  1 


log  (e'E,1£{)  Wi(M)2^ 


(4) 


In  the  following  display  we  drop  cross  terms  to  obtain  the  inequality,  and 
then  use  the  same  construction  as  in  Lemma  5  under  which  the  weights  and 
trajectories  are  independent  to  obtain  the  equality. 


lim  sup - log  Ex 

n— >oo  H 


N, 


2 


ln(.Xn) 


<  lim  sup - log  E, 

n— >oo  Tl 


§  1{^ 
1 


ln(xn) 

E  t 

3= 1 


=  lim  sup - log 

n— kx)  Tl 


( 

) 

P  ( %n)EXn 

V 

3=1 

) 

Suppose  we  prove  that  for  any  n  (and  in  particular  k,  =  ln(xn)),  that 


E,r 


NX 

E 

3= 1 


r(M) 

E  J2  MM)2 

i= 1 


(5) 


Since  ln{xn)  =  [nl/(xn)/A],  (4)  will  follow  from  Condition  1.  The  proof  of 
(5)  is  by  induction.  Let  Mj  denote  the  independent  random  variables  used 
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to  define  the  splitting  for  the  jth  particle  at  stage  k.  Then 


'^+i 

r(Mj) 

Ex 

E  K+ij)2 

=  Ex 

•Ln 

E 

. J=1 

_j=i 

2=1 

=  Er 


AT 

E 

3= 1 


«,Y 


r(M) 

E  Y, 


2—1 


/  r(M)  \K+1 

IeYMm)2  1 


We  now  turn  to  the  proof  of  the  lower  bound 


lim  inf - log  Ex 

n— >oo  Tl 


N, 


-I  2 


inM 

E  t 

3= 1 


ln  (*n  )  ,3 


>  W(x)-V(x)- 


i  ( 

log  (eYJ^P  Wi(M)2 


eB^Wln(xn),j 


(6) 


For  each  stage  k,  let  M”  •  denote  the  independent  random  variables  used  in 
the  splitting  of  particle  j  G  {l, . . iV"}.  Also,  let  7™  ■  denote  the  disjoint 
decomposition  of  the  particles  in  { 1 , . . .  ,-ZV”+1}  according  to  their  parent 
particle.  Observe  that  if  k,  l  G  7”  A;  /  l,  then  for  all  particles  descended 
from  A;  and  l,  k  is  the  time  of  their  last  common  ancestor.  Given  k  G  I™j, 
let  7”+1  j  fe  denote  the  descendants  of  this  particle  at  stage  ln(xn).  With 
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this  notation  we  can  write 


2 


k=1  i=l  k,lel™  -,k^lmkerr 


ln(xn),mk 


£B^Wlnixn),mk 


K+l,ln  ( xn),k 


.  ^  Vi 

n  v- 


ln(xn),mi 


2 


Let  «;”  ■  denote  the  products  of  the  weights  accumulated  by  particle 
j  G  {l, . . . ,  -/V"}  up  to  stage  re,  and  let  w™+1  ln,  ^  m  denote  the  product  of 

the  weights  accumulated  by  particle  m  G  |l, . . N^x  ^  j  between  stages 
re  +  1  and  the  final  stage.  Finally,  let  T%  denote  the  sigma  algebra  generated 
by  M™j,s  G  {1 j  G  {l, . . . ,  IV]1}  and  the  random  variables  used 
to  construct  X™j  f°r  these  same  indices.  Note  that  the  future  weights  are 
independent  of  TV.  and  that  the  distribution  of  XV,  ■>  depends  on  TV; 
only  through  X^j  if  &  £  I™,j  and  m  £  We  introduce  the 


notation 
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By  conditioning  on  EJ}  we  get 

Nn 


EE  E 

7= 1  mkel 


'{* 


ln(xn),mk 


,mk 


(7) 


K-\-l,ln  (xn),k 


mi&I 


re+l,Zn(a;n),fc 


ln(xn),mi 


=  EL 


=  EL 


ivr 


EFh  E 


7= i  kjei^^i 

'  m 


E  ^7  E  [Z%tk]  Ex*.  [Z2tl] 

7= i  k,iei?.,k& 


Using  again  the  independence  of  the  weights  and  trajectories  as  used  in  the 
proof  of  Lemma  5,  we  have 


£.v;,KJ  =p”(LU- 


Since 


W  =  E^wk{M)wi{M)  =  E 

k^l 

the  final  expression  in  (7)  equals 


E^(M) 


L  fc 


-  E 


E^(M): 


L  fc 


W£t 


AT" 


E1?/  (Us)' 

7=1 


We  conclude  that 


-Eh 


JV, 


n  2 


ira(*n) 

E  i{ 

7=1  1 


A"  %  e-B 


w  e  E* 

K=1 


n r 


EV  (AU: 

7=i 


EEX 
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For  a  final  time  we  use  that  the  weights  and  trajectories  are  independent, 
and  also  (5),  to  argue  that  for  any  bounded  and  measurable  function  F  and 
any  stage  k, 


'AT 

'NX 

Ef(*«"j)  K42 

=  EXn[F(X^)\EXn 

E  «42 

_i=1 

_j=l 

=  E 


r(M) 

E  ^(M)s 

2= 1 


EXn  [F  (X^)] 


Thus 


Ex 


n; 


-i  2 


E  ,  JeB\Wln&n)>3 


3  = 

Z™(a:n)-1  /  r(M) 


Av— 1 


2=1 


r(M) 

+  [^E 

2=1 


ln(x„) 


Err 


{X"n(.oon),l^A}  _ 


Since  ln(xn )  is  proportional  to  n,  to  prove  (6)  it  is  enough  to  show  that 
if  is  any  sequence  such  that  nn/n  — >  v  £  [0,  F(x)/A],  then 


lim  inf - log  _EX 

n—>oo  fi 


r(M) 


2=1 


I"'-'  a'.i  A/r  j 


>  fy(x)-i/(x)- 


A 


Observe  that  {X™nl 
lim  inf - log  Ex 

n— kx)  Ti 


i  A}  implies  XlnA  e  C 


pnV(*)/Al-«„-  By  Condition  4, 
>fF(x)  +  inf  W(y). 

V^^^JV^x)—vA 


By  the  subsolution  property,  W(y)  >  V(y)  log  Er(M)/ A.  Since  Holder’s 
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inequality  gives  —  log  (e  YH=i'>  wi{M)2')  <  log  (. Er(M )), 


lim  inf - log  Ex 

n—>oo  Tl 


r(M) 


E  Y.  »f(«)2  [l (K„xY 


i= 1 


r(M) 


>  — ulog  [  E  Wi(M)2  I  +W{x)+  inf  log^? 

'  1  y£dLv(x)_vA 


2= 1 
r(M) 


A 


-ulog  Wi(M)2j  +  W(x)  +  log  Er(M) 

log  (eYI^P  Wi{M)2 


—  v 


>  W(x)~  V{x)- 


and  the  proof  is  complete. 


4.1  Design  of  a  Splitting  Algorithm 

Suppose  that  V (x)  and  A  are  given  and  that  we  choose  a  splitting  mechanism 
(q,  r,  w)  which  is  unbiased  and  stable.  By  Theorem  8  the  asymptotic  rate  of 
decay  of  the  second  moment  is  given  by 


W(x)  -  V{x) 


log  (EYJ^P  Wi{M)2^j 


A 


By  Holder’s  inequality 


r(M)  r(M) 

E  Y  Wi(M)2  ■  Er(M)  >  E  E  =  r 
2—1  2=1 


and  therefore 


r(M) 

log  (  E  Y  wi(MY 
2=1 


<  log  ( Er(M )) . 


Equality  holds  if  and  only  if  Wi{m)  =  Er^M^  for  all  i  G  {1, . . . ,  r{m)}  and 
all  m  G  J}.  Given  the  value  u  =  Er(M),  an  alternative  splitting 
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mechanism  which  is  arguably  the  simplest  which  preserves  the  value  and 
achieves  the  equality  in  Holder’s  inequality  is  that  defined  by  J  =  2  and 

qi  =  \u\  -  u,  q2  =  1  -  qi,  r(l)  =  [u\  ,r(l)  =  |uj  +  1,  m(j)  =  l/«  all  i,j. 

(8) 

Given  a  subsolution  W,  the  design  problem  and  the  performance  of  the 
resulting  algorithm  can  be  summarized  as  follows. 

•  Choose  a  level  A  and  mean  number  of  particles  u,  and  define  an  im¬ 
portance  function  V  by  logu  •  V(x)/ A  =  W(x).  Define  the  splitting 
mechanism  by  (8).  The  resulting  splitting  algorithm  will  be  stable. 

•  If  SgA  is  a  single  sample  constructed  according  to  this  algorithm,  then 
we  have  the  asymptotic  performance 

lim  — ■  log  Ex  [(sgA)^]  =  W(x)  +  W(x). 

n— >oo  ti 

•  The  largest  possible  subsolution  satisfies  W(x)  =  W{x),  in  which  case 
we  achieve  asymptotically  optimal  performance. 


Remark  9  Although  the  subsolution  property  guarantees  stability,  it  could 
allow  for  polynomial  growth  of  the  number  of  particles.  If  in  practice  one 
observes  that  a  large  number  of  particles  make  it  to  B  in  the  course  of 
simulating  a  single  sample  then  one  can  consider  reducing  the  value  of 
A  slightly,  while  keeping  the  mechanism  and  V  fixed.  This  will  increase  the 
second  moment  of  the  estimator  slightly,  but  will  also  lead  to  an  algorithm 
that  requires  little  effort  for  each  single  sample. 

5  The  Associated  Hamilton- Jacobi-Bellman  Equa¬ 
tion 

The  probability  pn(x)  is  intimately  and  naturally  related,  via  the  exponential 
rate  W(x),  with  a  certain  nonlinear  PDE.  This  relation  is  well  known,  and 
follows  from  the  fact  that  W  is  characterized  in  terms  of  an  optimal  control 
or  calculus  of  variations  problem.  We  begin  this  section  by  defining  the  PDE 
and  the  notion  of  a  subsolution  in  the  PDE  context. 

Our  interest  in  this  characterization  is  because  it  is  more  convenient  for 
the  explicit  construction  of  subsolutions  than  the  one  based  on  the  calculus 
of  variations  problem.  See,  for  example,  the  subsolutions  constructed  for 
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various  large  deviation  problems  in  [5]  and  [3] .  (It  should  be  noted  that  the 
constructions  in  these  papers  ultimately  produce  classical  subsolutions.  In 
contrast,  the  splitting  algorithms  require  only  the  weaker  viscosity  subso¬ 
lution  property.  However,  the  smoother  subsolutions  constructed  in  [3,  5] 
are  obtained  in  as  mollified  versions  of  viscosity  subsolutions,  and  it  is  the 
construction  of  these  unmollified  functions  that  is  relevant  to  the  present 
paper.)  Other  examples  will  be  given  in  the  next  section.  Since  our  only 
interest  in  the  PDE  is  as  a  tool  for  explicit  constructions,  we  describe  the 
characterization  formally  and  in  the  simplest  possible  setting,  and  refer  the 
reader  to  [1,  6]. 

For  q  E  Md,  let 


H(x,  q)  =  inf  [(<?,  (3)  +  L(x,  (3)}  . 

/3e 

Then  under  regularity  conditions  on  L  and  the  sets  A  and  B,  W  can  be 
characterized  as  the  maximal  viscosity  subsolution  to 

H(x,  DW(x))  =  0,^iUB,  W(x)  = 

A  continuous  function  IT  is  a  viscosity  subsolution  to  this  equation  and 
boundary  conditions  if  W (x)  <  0  for  x  E  dB,  W (x)  <  oo  for  x  E  dA  and 
if  the  following  condition  holds.  If  4>  '■  K. d  — >  M  is  a  smooth  test  function 
such  that  the  mapping  x  —>  [W'(x)  —  4>(x)]  attains  a  maximum  at  xo  E 
Md\  (iUB),  then  H(xo,  D<f>(x  o))  >  0. 

Note  that  H(x,  •)  is  concave  for  each  xo  E  Mrf,  and  hence  the  pointwise 
minimum  of  a  collection  of  subsolutions  is  again  a  subsolution.  It  is  this 
observation  which  makes  the  explicit  construction  of  subsolutions  feasible  in 
a  number  of  interesting  problems  (see  [5] ) . 

In  Section  2  we  defined  W(x)  to  be  a  subsolution  to  the  calculus  of 
variations  problem  if  it  satisfied  the  boundary  inequalities  and 

W(x)-W(y)<J(x,y) 

for  all  x,  y  E  Mrf\  (Au  B).  We  now  give  the  elementary  proof  that  these 
notions  coincide.  Let  W(x)  —  (f>(x)  attain  a  maximum  at  xo-  Thus  for  any 
(3  E  and  all  a  E  (0, 1)  sufficiently  small,  TT(xo  +  a/3 )  —  4>(xo  +  a/3)  < 
W(xo)  —  (j>{x o),  and  so 

</>(x0)  -  0(xo  +  a(3)  <  W (x0)  -W(x0  +  a/3) 

<  J(x0,x0  +  a/3). 


(  0  x  E  dB 
\  oo  x  E  dA 
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Since  J(x,y)  is  defined  as  an  infimum  over  all  trajectories  that  connect  x 
to  y ,  we  always  have 

J(xq,  x0  +  a/3)  <  /  L(x0  +  s/3,  f3)ds. 

Jo 

Hence  if,  e.g.,  the  mapping  x  — >  L(x,  (3)  is  continuous,  then  for  all  f3 

4>{xq)  -  4>(x o  +  a/3 )  <  L(xq,  (3)a  +  o(o). 

Using  Taylor’s  Theorem  to  expand  </>,  sending  a  |  0  and  then  infimizing  over 
/ 3  gives 

0  <  inf  [(D<j)(xo),0)  +  L(x0,  /?)]  <  H(a;,  D<j>(x 0)). 

/3e  Rd 

Thus  W  is  a  subsolution. 

The  calculation  just  given  does  not  show  that  W  is  the  maximal  viscosity 
subsolution,  or  even  that  it  is  always  safe  to  use  a  viscosity  subsolution  to 
the  PDE  in  the  design  of  a  splitting  scheme.  The  characterization  of  W  as 
the  maximal  viscosity  subsolution  requires  that  we  establish  W (x)  <  W (x) 
whenever  W  is  a  viscosity  subsolution.  A  standard  approach  to  this  would 
be  to  show  that  given  a  viscosity  subsolution  W,  any  point  x  ^  A  U  B, 
and  any  e  >  0,  there  exists  a  smooth  classical  subsolution  W£  such  that 
We(x)  >  W(x)  —  e.  When  this  is  true  the  classical  verification  argu¬ 
ment  [6]  can  be  sued  to  show  W(x)  >  W£(x),  and  since  e  >  0  is  arbitrary 
W(x)  >  W(x).  This  brings  us  very  close  to  the  method  of  constructing 
nearly  optimal  importance  sampling  schemes  as  described  in  [3,  5],  where 
the  design  of  the  scheme  must  be  based  on  the  smooth  classical  subsolution 
W£(x)  rather  than  W(x).  In  all  the  examples  of  the  next  subsection  the  in¬ 
equality  W (x)  <  W (x)  can  be  established  by  constructing  a  nearby  smooth 
subsolution  as  in  [3,  5]. 

6  Numerical  Examples 

In  this  section  we  present  some  numerical  results.  We  study  four  prob¬ 
lems:  buffer  overflow  for  a  tandem  Jackson  network  with  one  shared  buffer, 
simultaneous  buffer  overflow  for  a  tandem  Jackson  network  with  separate 
buffers  for  each  queue,  some  buffer  overflow  problems  for  a  simple  Markov 
modulated  queue  and  estimation  of  the  sample  mean  of  a  sequence  of  i.i.d. 
random  variables. 

Subsolutions,  even  among  those  with  the  maximal  value  at  a  given  point, 
are  not  unique,  and  indeed  for  the  problems  to  be  discussed  there  are  some¬ 
times  a  number  of  reasonable  choices  one  could  make.  We  will  not  give 
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any  details  of  the  proof  of  the  subsolution  property,  but  simply  note  that  in 
each  case  it  can  be  proved  by  a  direct  verification  argument  as  discussed  in 
Section  5. 


6.1  Tandem  Jackson  Network  -  Single  Shared  Buffer 

Consider  a  stable  tandem  Jackson  network  with  service  rates  A  <  min{/i1,  p2}. 
Suppose  that  the  two  queues  share  a  single  buffer  and  that  we  are  interested 
in  the  probability 

pn  =  P( o,o)  {total  population  reaches  n  before  first  return  to  (0,0)  } 

It  is  well  known  that 

lim - log pn  =  min{p| ,  p2} 

n— >oo  Ti 

where  pi  =  logy-.  Further,  the  (continuous  time)  Hamiltonian  that  corre¬ 
sponds  to  subsolutions  of  the  relevant  calculus  of  variations  problem  is 

H (p)  =  —  [A(e_Pl  -  1)  +  /i1(e&,1-pa>  -  1)  +  p2{eP2  -  1)]. 

(see  [3]  for  the  discrete  time  analogue).  Without  loss  of  generality  (see  [3]) 
one  can  assume  that  p2  5;  AH-  By  inspection  H (p)  =  0  for  p  =  —  log  y(l, 1) 
(this  root  is  suggested  by  the  form  of  the  escape  region),  and  W(x)  = 
(p,  x )  +  log  y  is  a  subsolution  which  in  fact  takes  the  maximal  value  at 
(0,  0)  and  so  leads  to  an  asymptotically  optimal  splitting  scheme.  The  table 
below  shows  the  results  of  a  splitting  simulation  with  20,000  runs  for  A  =  1, 
p1  =  p2  =  4.5  and  for  various  values  of  n. 


n 

30 

40 

50 

Theoretical  Value 

2.63  X  10-ls 

1.03  X  10”^4 

3.80  X  10-J1 

Estimate 

2.69  X  10-1B 

0.97  X  lO^4 

3.98  X  10-31 

Std.  Err. 

0.11  X  10-ls 

0.04  x  lO^4 

0.20  X  10-31 

95%  C.I. 

[2.48,  2.90]  X  10-ls 

[0.88,  1.05]  X  10-24 

[3.60,  4.37]  x  10-J1 

Time  Taken  (s) 

32 

67 

165 

Total  no.  successes 

471776 

579127 

810382 

Max  no.  particles 

4027 

4134 

6987 

Table  1.  A  =  1,  pl  =  p2  =  4.5,  asymptotically  optimal  scheme. 

It  was  noted  in  Remark  9  that  the  number  of  particles  generated  may 
grow  subexponentially  in  n  and  this  appears  to  be  reflected  in  the  data. 
Following  the  suggestion  of  the  remark,  we  also  considered  a  slightly  subop- 
tirnal  subsolution  in  the  hopes  of  better  controlling  the  number  of  particles 
with  little  loss  in  performance.  The  table  below  shows  the  results  of  numer¬ 
ical  simulation  for  the  same  problem  with  a  splitting  algorithm  based  on  the 
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subsolution  W(x)  =  W(x).  Again  each  estimate  is  obtained  using  20,000 

runs.  The  results  are  in  accord  with  our  expectations. 


n 

30 

40 

50 

Theoretical  Value 

2.63  X  10-ls 

1.03  X  10 ^ 

3.80  X  10-31 

Estimate 

2.80  X  10-1B 

1.10  X  lO-^4 

3.70  X  10-B1 

Std.  Err. 

0.14  X  10-1B 

0.08  X  10 

0.31  X  10-31 

95%  C.I. 

[2.51,  3.10]  X  10 ls 

[0.94,  1.26]  X  10“^4 

[3.09,  4.31]  X  10-31 

Time  Taken  (s) 

6 

10 

15 

Total  no.  successes 

66607 

44707 

25578 

Max  no.  particles 

806 

1145 

678 

Table  2.  A  =  1,  pi  =  p2  =  4.5,  asymptotically  suboptimal  scheme. 

6.2  Tandem  Jackson  Network  -  Separate  Buffers 

In  the  paper  [8]  the  authors  address  the  problem  of  asymptotic  optimality 
for  splitting  algorithms.  In  particular  they  consider  an  approach  to  choos¬ 
ing  level  sets  that  are  claimed  to  be  “consistent”  with  the  large  deviations 
analysis  and  show  that  this  does  not  always  lead  to  asymptotically  opti¬ 
mal  algorithms.  They  illustrate  their  results  by  considering,  for  a  tandem 
Jackson  network,  the  problem  of  simulating  the  probabilities 

pn  =  P(0,o)  {both  queues  simultaneously  exceed  n  before  first  return  to  (0,  0)}  . 

It  is  shown  that 

lirn - logpn  =  Pi  +  p2  =  7, 

n— >oc>  Ti 

and  the  authors  propose  a  splitting  algorithm  based  on  the  importance  func¬ 
tion  U(x )  =  7  —  yminjxi,  X2},  which  is  just  a  rescaling  of  the  target  set 
B  =  {(x,  y)  :  x  >  n  or  y  >  n}.  They  show  that  although  the  level  sets  given 
by  this  function  may  intuitively  seem  to  agree  with  the  most  likely  path  to 
the  rare  set  identified  by  the  large  deviations  analysis,  the  resulting  splitting 
algorithm  in  fact  has  very  poor  performance.  By  analyzing  this  importance 
function  using  the  subsolution  approach  it  is  very  easy  to  see  why  this  is 
the  case.  The  Hamiltonian  corresponding  to  subsolutions  is  the  same  as  in 
the  previous  section  and  it  is  clear  to  see  that  U(x)  is  not  a  subsolution. 
However,  the  function 


W(x)  =  7  -  p^i  -  p2x 2 

is  a  subsolution.  Further  JF(0)  =  7,  thus  the  corresponding  importance 
function  will  lead  to  an  asymptotically  optimal  splitting  algorithm.  Nu¬ 
merical  results  are  presented  for  the  cases  A  =  1,  Pi  =  3,/x2  =  2  and 
A  =  1,  Pi  =  2,/r2  =  3  which  are  the  same  rates  originally  considered  in 
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[8].  Each  estimate  was  obtained  by  a  simulation  using  20,000  runs. 


n 

10 

20 

30 

Theoretical  Value 

9.64  x  10”8 

1.60  X  10”18 

2.64  x  10”" 

Estimate 

9.61  X  10”8 

1.60  X  10”18 

2.50  X  10”" 

Std.  Err. 

0.17  X  10”8 

0.03  X  10”18 

0.06  X  10”" 

95%  C.I. 

[9.28,  9.94]  x  10”8 

[1.53,  1.66]  X  10 ls 

[2.38,  2.611  X  10”" 

Time  Taken  (s) 

17 

137 

454 

Total  no.  successes 

116255 

116918 

110337 

Max  no.  particles 

672 

2274 

3864 

Table  3.  A  =  1,  =  3,  p2  =  2,  asymptotically  optimal  scheme. 


n 

10 

20 

30 

Theoretical  Value 

9.64  X  10”B 

1.60  X  10”18 

2.64  X  10”" 

Estimate 

9.49  X  10”B 

1.67  X  IO”10 

2.76  X  10”" 

Std.  Err. 

0.27  X  10”8 

0.07  X  10”18 

0.14  X  10”" 

95%  C.I. 

[8.97,  10.0]  X  10”8 

[1.54,  1.80]  X  10”18 

[2.49,  3.02]  X  10”" 

Time  Taken  (s) 

14 

107 

368 

Total  no.  successes 

114763 

122184 

121847 

Max  no.  particles 

1470 

7302 

16050 

Table  4.  A  =  1,  =  2,  p2  =  3,  asymptotically  optimal  scheme. 

As  expected,  these  results  show  a  vast  improvement  over  those  obtained 
in  [8].  Finally  the  tables  below  show  the  results  of  numerical  simulation 
for  the  same  problem  with  a  splitting  algorithm  based  on  the  subsolution 
W(x)  =  0.95  W{x). 


n 

10 

20 

30 

Theoretical  Value 

9.64  X  10”8 

1.60  X  10”18 

2.64  X  10”" 

Estimate 

9.57  X  10”8 

1.55  X  10”18 

2.47  X  10”" 

Std.  Err. 

0.21  X  10”8 

0.05  X  10”18 

0.13  X  10”" 

95%  C.I. 

[9.17,  9.98]  X  10”8 

[1.45,  1.65]  X  10”18 

[2.22,  2.72]  X  10”" 

Time  Taken  (s) 

8 

38 

67 

Total  no.  successes 

47268 

18875 

7425 

Max  no.  particles 

433 

545 

703 

Table  5.  A  =  1,  =  3,  p2  =  2,  asymptotically  suboptimal  scheme. 


n 

10 

20 

30 

Theoretical  Value 

9.64  X  10”8 

1.60  X  10”18 

2.64  X  10”" 

Estimate 

9.32  x  10”8 

1.54  x  10”18 

2.56  X  10”" 

Std.  Err. 

0.30  X  10”8 

0.07  X  10”18 

0.20  X  10”" 

95%  C.I. 

[8.73,  9.92]  X  10”8 

[1.39,  1.70]  X  10”18 

[2.16,  2.96]  X  10”" 

Time  Taken  (s) 

7 

30 

57 

Total  no.  successes 

46037 

18859 

7710 

Max  no.  particles 

780 

1724 

1712 

Table  6.  A  =  1,  /r1  =  2,  p2  =  3,  asymptotically  suboptimal  scheme. 

The  choice  of  W (x)  =  7  —  PiX  1  —  p2x 2  as  importance  function  may  seem 
arbitrary,  however  it  turns  out  to  be  a  very  natural  choice.  Given  a  >  0 
consider  a  “nice”  set  B  such  that  for  the  importance  function  Wa{x)  = 
a  —  p\X\  —  p2X2,  B  n  {x  :  Wa(x)  >  0}  =  0  and  B  (1  {x  :  Wa(x)  =  0}  /  0. 
Then 

lim - logpn  =  a, 

n— ►  00  Tl 


30 


where 


pn  =  reaches  nB  before  first  return  to  (0,  0)). 

Intuitively  this  means  that  all  points  on  a  level  set  of  the  function  Wa(x ) 
have  the  same  asymptotic  probability.  Thus  given  any  such  nice  set  B  we 
can  identify  its  large  deviations  rate  by  finding  the  unique  a*  such  that 
B  D  {x  :  Wa*  (x)  >  0}  =  0  and  B  n  {x  :  Wa*  (x)  =  0}  /  0.  Further  Wa*  (x) 
will  be  an  asymptotically  optimal  importance  function. 

That  the  family  of  functions  Wa  has  such  a  property  is  because  the  sta¬ 
tionary  probabilities  for  a  stable  tandem  Jackson  network  have  the  product 
form  7r({i,j})  =  (1  —  p1)(l  —  /02 ) Pi /°2 ■  Indeed,  by  using  an  argument  based 
on  the  recurrence  theorem,  we  can  see  that  every  stable  tandem  Jackson 
network  has  a  family  of  affine  subsolutions  with  the  same  property.  Further 
this  will  be  true  for  any  IV-dimensional  queueing  network  for  which  the  sta¬ 
tionary  probabilities  tt  have  asymptotic  product  form,  by  which  we  mean 
that  there  exist  p1 , . . . ,  pN  such  that  for  any  nice  set  B 

lim  — —  log7r(raP)  =  inf {x\pl  H - h  xnPn  '■  (x i>  •  •  -,xn)  G  B}. 

n— >oo  77, 

6.3  Non-Markovian  Process 

Since  many  models  are  non-Markovian  we  present  an  example  of  splitting 
for  a  non-Markovian  process.  Consider  a  tandem  network  whose  arrival 
and  service  rates  are  modulated  by  an  underlying  process  Mt  which  takes 
values  in  the  set  {1,2},  such  that  the  times  taken  for  the  modulating  pro¬ 
cess  to  switch  states  are  independent  exponential  random  variables  with 
rate  7(1)  if  M  is  in  state  1  and  7(2)  otherwise.  Let  A(l),  ^(1), /x2(l)  and 
A(2),  /x1(2),  /x2(2)  be  the  service  rates  of  the  queue  in  the  first  and  second 
states  respectively.  It  is  known  (see,  e.g.,  [4])  that  the  Hamiltonian  can  be 
characterized  in  terms  of  the  solution  to  an  eigenvalue/eigenvector  problem 
parameterized  by  p.  This  characterization  is  used  for  calculating  the  various 
roots  to  H(p)  =  0  used  below. 

Consider  again  the  single  shared  buffer  problem.  Let  A(l)  =  1,//1(1)  = 
3.5,  ^(1)  =  2.5,  7(1)  =  0.2  and  A(2)  =  1,^(2)  =  4.5,  ^(2)  =  4.5,  7(2)  = 
0.5.  Using  a  verification  argument,  one  can  show  that  W(x)  =  1.00029(1  — 
x\  —  X2)  is  a  subsolution  with  the  maximal  value  VF(O).  Thus  using  W{x) 
leads  to  an  asymptotically  optimal  splitting  scheme.  The  results  of  simula¬ 
tions  run  using  this  importance  function  are  shown  below,  where  again  each 
estimate  was  derived  using  20,000  runs. 
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n 

30 

40 

50 

Theoretical  Value 

6.36  X  10-13 

2.88  X  10”1' 

1.30  X  10 al 

Estimate 

6.05  X  10-13 

2.91  x  10-1/ 

1.33  X  10 al 

Std.  Err. 

0.21  X  10 la 

0.11  X  10-1' 

0.06  X  10 al 

95%  C.I. 

[5.63,  6.47]  X  10~13 

[2.69,  3.13]  X  10”1' 

[1.21,  1.44]  x  10~al 

Time  Taken  (s) 

3 

5 

8 

Total  no.  successes 

47992 

51023 

51434 

Max  no.  particles 

195 

330 

469 

Table  7.  Markov-modulated  network,  total  population  overflow. 

It  is  also  worth  revisiting  the  separate  buffers  problem  for  the  same 
queueing  network.  For  the  same  arrival  and  service  rates  one  can  again  use 
a  verification  argument  to  show  that  W(x)  =  2.2771  —  1.2953xi  —  0.9818x2 
leads  to  an  asymptotically  optimal  splitting  scheme.  Results  of  a  simulation 
using  20,000  runs  are  shown  below. 


n 

10 

20 

30 

Theoretical  Value 

8.36  X  10-iu 

1.07  X  10~ia 

1.39  X  10~aa 

Estimate 

8.37  X  10-lu 

1.07  X  10 la 

1.44  X  10-aa 

Std.  Err. 

0.20  X  10-lu 

0.03  X  10 la 

0.05  X  10”aa 

95%  C.I. 

[7.99,  8.76]  X  10-lu 

[1.01,  1.13]  X  10 la 

[1.34,  1.54]  x  10-aa 

Time  Taken  (s) 

12 

89 

273 

Total  no.  successes 

129643 

128335 

133658 

Max  no.  particles 

1215 

5126 

8051 

Table  8.  Markov-modulated  network,  simultaneous  separate  buffer 

overflow. 

Finally  we  investigate  what  happens  in  this  case  if  we  use  a  strict  sub¬ 
solution  as  importance  function.  The  table  below  shows  the  results  of  a 
simulation  using  20,000  runs  based  on  the  importance  function  W  =  0.95 W. 


n 

10 

20 

30 

Theoretical  Value 

8.36  X  10 lu 

1.07  X  10 la 

1.39  X  10”aa 

Estimate 

8.33  X  10 lu 

1.14  x  10 la 

1.38  X  10-aa 

Std.  Err. 

0.21  X  10 lu 

0.04  X  10 la 

0.07  X  10-aa 

95%  C.I. 

[7.91,  8.74]  X  10"lu 

[1.06,  1.22]  X  10”la 

[1.25,  1.51]  X  10~aa 

Time  Taken  (s) 

8 

42 

91 

Total  no.  successes 

77221 

49032 

27527 

Max  no.  particles 

883 

2887 

3800 

Table  9.  Markov-modulated  network,  asymptotically  suboptimal  scheme. 


6.4  Rare  Events  for  the  Sample  Mean 

It  is  also  worth  noting  that  this  approach  works  just  as  well  for  finite 
time  problems.  Assume  that  Xi,X2,...  is  a  sequence  of  i.i.d.  N(0 ,IN) 
random  variables  where  IN  is  the  IV-dimensional  identity  matrix  and  let 
Sn  =  - l-  Ya= i  Suppose  that  we  are  interested  in  simulating  the  sequence 

of  probabilities 
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Pn  =  P{SneC} 

for  some  set  C  such  that  C  does  not  include  the  origin.  For  j  E  {1, . . . ,  n} 
let  Sn(j)  =  ^  1  Xf  Then  given  sequences  xn ,  jn  and  x  E  WN ,  t  E  [0, 1] 

such  that  limn_>0O  xn  =  x  and  lim^oo  jn/re  =  t,  the  large  deviations  result 

lim  — -logP{Sn  E  C\Sn(jn)  =  xn }  =  W(x,t) 

n— xx)  77, 

holds.  Further  the  PDE  corresponding  to  solutions  of  the  calculus  of  varia¬ 
tions  problem  is  (see  [5] ) 


Wt  +  infH(LW;/3)  =0, 

where  H(s;  (3)  =  (s,/3)  +  L(/3)  and  L(/3)  =  ||/3||2/2.  We  can  put  this  into 
the  general  framework  in  the  standard  way,  i.e. ,  by  considering  the  time 
variable  as  simply  another  state  variable.  The  set  B,  for  example,  is  then 
C  x  {1}.  Strictly  speaking  this  problem  does  not  satisfy  the  conditions  used 
previously,  since  the  sets  A  and  B  no  longer  have  disjoint  closure.  Although 
we  omit  the  details,  it  is  not  difficult  to  work  around  this  problem. 

It  is  easy  to  see  that  any  affine  function  of  the  form 

W(x,  t)  =  —  (a,  x)  +  || ct|| 2  —  (1  —  t)H(a), 

where  H(a)  =  ||a||2/2,  is  a  subsolution,  though  it  may  not  have  the  optimal 
value  at  (0,  0)  and  may  not  be  less  than  or  equal  to  zero  on  B.  We  can  use 
the  fact  that  the  minimum  of  a  collection  of  subsolutions  is  also  a  subsolution 
to  build  a  subsolution  which  satisfies  the  boundary  condition  and  has  the 
maximal  value  at  (0,  0).  For  example,  suppose  that  C  =  {x  E  M2  :  (pi,  x)  > 
1}  U  {x  E  M2  :  (p2,x)  >  1}  where  p\  =  (0.6,  0.8)  and  p2  =  (0.6, —0.8). 
Let  W\(x)  =  1  -  {pi,  x)  —  \{l  -  t),  W2(x)  =  1  -  ( p2 ,  x)  —  —  t ).  Then 

W  =  W\  A  W2  is  a  subsolution  and  in  fact  provides  an  asymptotically  optimal 
splitting  scheme  since  TT(0,0)  =  IF(0,0).  Numerical  results  are  shown 
below.  Each  estimate  was  derived  using  100,000  runs.  In  contrast  to  all 
the  previous  examples  where  the  process  evolves  on  a  grid,  the  simulated 
process  in  this  case  may  cross  more  than  one  splitting  threshold  in  a  single 
discrete  time  step.  This  appears  to  increase  the  variance  somewhat  (at  least 
if  the  straightforward  implementation  as  described  in  Section  2  is  used), 
and  hence  we  increased  the  number  of  runs  to  keep  the  relative  variances 
comparable. 
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n 

20 

30 

40 

Theoretical  Value 

7.75  X  10-B 

4.33  X  10-B 

2.54  X  10~lu 

Estimate 

7.36  X  10-H 

4.21  x  10-s 

2.46  X  10 lu 

Std.  Err. 

0.21  X  10 t> 

0.12  X  10-s 

0.08  X  10-lu 

95%  C.I. 

[6.94,  7.78]  X  10~° 

[3.97,  4.45)  X  10~B 

[2.31,  2.61]  X  10”lu 

Time  Taken  (s) 

15 

38 

85 

Total  no.  successes 

16209 

13756 

11934 

Max  no.  particles 

155 

379 

298 

Table  10.  Sample  mean  for  sums  of  iid. 
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